All Questions
Tagged with neural-network, gradient-descent
123 questions
2 votes
1 answer
91 views
Why are the second-order derivatives of a loss function nonzero when linear combinations are involved?
I'm working on implementing Newton's method to perform second-order gradient descent in a neural network and am having trouble computing the second-order derivatives. I understand that in practice, ...
1 vote
1 answer
102 views
In a computational graph, how do I calculate the total upstream gradient of a node with multiple upstream paths?
Given a computation graph with a node (like the one below), I understand that I can use the upstream gradient dL/dz to calculate all of my downstream gradients. But what if there are multiple ...
1 vote
2 answers
399 views
Gradient Descent: Is the magnitude of gradient vectors arbitrary?
I am only just getting familiar with gradient descent through learning logistic regression. I understand that the directional component of the gradient vector is correct information derived from the slope ...
2 votes
1 answer
544 views
Gradients of lower layers of a NN when the gradient of an upper layer is 0?
Say we have a neural network with an input layer, a hidden layer and an output layer. Say the gradients with respect to the weights and biases of the output layer are all 0. Then, by backpropagation ...
1 vote
1 answer
76 views
Doubt about gradients and the vanishing gradient problem in backpropagation
As far as I know, in backpropagation the gradient of the loss function is used to update the weights. In backpropagation, the weight updates become small w.r.t. the gradients, and this leads to the vanishing gradient problem. ...
5 votes
2 answers
10k views
What exactly is the gradient norm?
I found that there is no common resource or well-defined definition for "gradient norm"; most search results are ML experts providing answers that involve the gradient norm, or papers ...
0 votes
1 answer
227 views
Affine layer - gradient shape
In the course cs231n, I need to implement the backward-pass computation for an affine (linear) layer: ...
0 votes
1 answer
127 views
GAN Generator Backpropagation Gradient Shape Doesn't Match
In the TensorFlow example (https://www.tensorflow.org/tutorials/generative/dcgan#the_discriminator) the discriminator has a single output neuron (assume batch_size=1). Then over in the training loop ...
0 votes
0 answers
116 views
Why is backpropagation done in every epoch when the loss is always a scalar?
I understand that the backpropagation algorithm calculates the derivative of the loss with respect to all the parameters in the neural network. My question is: this derivative is constant, right, because the ...
2 votes
0 answers
142 views
Can I find the input that maximises the output of a Neural Network?
So I trained a 2 layer Neural Network for a regression problem that takes $D$ features $(x_1,...,x_D)$ and outputs a real value $y$. With the model already trained (weights optimised, fixed), can I ...
0 votes
0 answers
657 views
Proof that averaging weights is equivalent to averaging gradients (FedSGD vs FedAvg)
The first federated learning paper, "Communication-Efficient Learning of Deep Networks from Decentralized Data", presents FedSGD and FedAvg. In federated learning, the learning task is ...
0 votes
0 answers
155 views
Calculating the derivative of the bias in backpropagation
Looking at the algorithm on Wikipedia, we can implement backpropagation by calculating $$\delta^{L}=\left(f^{L}\right)'\cdot\nabla_{a^{L}}C$$ (where I treat $\left(f^{L}\right)'$ as an $n\times n$ ...
2 votes
1 answer
874 views
How does gradient descent avoid local minima?
In Neural Networks and Deep Learning, the gradient descent algorithm is described as moving in the opposite direction of the gradient (link to the relevant place in the book). What prevents this strategy from landing in ...
1 vote
1 answer
4k views
How to calculate the loss function?
I hope you are doing well. I want to ask a question regarding the loss function in a neural network. I know that the loss function is calculated for each data point in the training set, and then the ...
1 vote
0 answers
55 views
How to interpret integrated gradients in an NLP toxic text classification use-case?
I am trying to understand how integrated gradients work in the NLP case. Let $F: \mathbb{R}^{n} \rightarrow [0,1]$ be a function representing a neural network, $x \in \mathbb{R}^{n}$ an input, and $x' \in ...